1. Introduction

The rapid development and wide application of computer techniques make it possible to collect and
store huge amounts of data, in which the number of measured variables is usually large. Such
high-dimensional data occur in many modern scientific fields, such as microarray data in biology,
stock market analysis in finance and wireless communication networks. Traditional estimation and
testing tools are no longer valid, or perform badly, for such high-dimensional data, since they
typically assume a sample size $n$ that is large with respect to the number of variables $p$. A
better approach in this high-dimensional setting is based on asymptotic theory in which both $n$
and $p$ approach infinity.
To illustrate this point, let us mention the case of Hotelling's $T^2$-test. The failure of the
$T^2$-test for high-dimensional data was mentioned as early as Dempster (1958). As a remedy,
Dempster proposed a so-called non-exact test. However, the theoretical justification of Dempster's
test came much later, in Bai and Saranadasa (1996), inspired by modern random matrix theory
(RMT). These authors found the necessary correction for the $T^2$-test to compensate for the
effects due to high dimension.

In this paper, we consider two likelihood ratio (LR) tests concerning covariance matrices. We first
give a theoretical explanation for the failure of these tests in the high-dimensional data context.
Next, with the aid of random matrix theory, we provide the necessary corrections to these LR tests
to cope with the high-dimensional effects.

First, we consider the one-sample covariance hypothesis test. Suppose that $x$ follows
a $p$-dimensional Gaussian distribution $N(\mu_p, \Sigma_p)$ and we want to test

\[ H_0 : \Sigma_p = I_p, \tag{1.1} \]

where $I_p$ denotes the $p$-dimensional identity matrix. Note that testing $\Sigma_p = A$ with an
arbitrary covariance matrix $A$ can always be reduced to the above null hypothesis by the
transformation $A^{-1/2}x$.

Let $(x_1, \ldots, x_n)$ be a sample from $x$, where we assume $p < n$. The sample covariance
matrix is

\[ S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^*, \tag{1.2} \]

where $\bar{x}$ is the sample mean, and set

\[ L^* = \operatorname{tr} S - \log|S| - p. \tag{1.3} \]

The likelihood ratio test statistic is

\[ T_n = n \cdot L^*. \tag{1.4} \]

Keeping $p$ fixed while letting $n \to \infty$, the classical theory states that $T_n$ converges to
the $\chi^2_{p(p+1)/2}$ distribution under $H_0$.

However, as will be shown, this classical approximation leads to a test size much higher than
the nominal test level in the case of high-dimensional data, because $T_n$ approaches infinity for
large $p$. As seen from Table 1 in §3, for dimension and sample sizes $(p, n) = (50, 500)$, the
realized size of the test is 22.5% instead of the nominal 5% level. The result is even worse for
the case $(p, n) = (300, 500)$, with a 100% test size.
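As an illustration of (1.2)-(1.4), the following minimal sketch computes the classical statistic $T_n$ and the degrees of freedom of its chi-square limit. It assumes NumPy is available; the function name `classical_lrt_one_sample` and the small data set are hypothetical and chosen only so that $S = I_2$ exactly.

```python
import numpy as np

def classical_lrt_one_sample(X):
    """Return (T_n, df) for H0: Sigma = I, following (1.2)-(1.4).

    X is an (n, p) array with one observation per row.
    """
    n, p = X.shape
    # bias=True gives the 1/n normalisation used in (1.2)
    S = np.cov(X, rowvar=False, bias=True)
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        raise ValueError("S must be positive definite (this needs p < n)")
    L_star = np.trace(S) - logdet - p          # (1.3)
    return n * L_star, p * (p + 1) // 2        # (1.4) and the chi-square df

# Hypothetical toy data whose sample covariance is exactly the identity.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
T_n, df = classical_lrt_one_sample(X)
```

In the classical test, `T_n` would then be compared with an upper quantile of the chi-square distribution with `df` degrees of freedom.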

Based on a recent CLT for linear spectral statistics (LSS) of large-dimensional sample covariance
matrices (Bai and Silverstein, 2004), we construct a corrected version of $T_n$ in §3. As shown by
the simulation results of §3.1, the corrected test performs much better in high dimensions.
Moreover, it also performs correctly for moderate dimensions like $p = 10$ or 20. For the dimension
and sample sizes $(p, n)$ cited above, the sizes of the corrected test are 5.9% and 5.2%,
respectively, both close to the 5% nominal level.

The second test problem we consider is the equality of two high-dimensional covariance matrices.
Let $x_i = (x_{1i}, x_{2i}, \ldots, x_{pi})^T$, $i = 1, \ldots, n_1$, and
$y_j = (y_{1j}, y_{2j}, \ldots, y_{pj})^T$, $j = 1, \ldots, n_2$, be observations from two
$p$-dimensional normal populations $N(\mu_k, \Sigma_k)$, $k = 1, 2$, respectively. We wish to test
the null hypothesis

\[ H_0 : \Sigma_1 = \Sigma_2. \tag{1.5} \]

The related sample covariance matrices are

\[ A = \frac{1}{n_1}\sum_{i=1}^{n_1} (x_i - \bar{x})(x_i - \bar{x})^*, \qquad
   B = \frac{1}{n_2}\sum_{j=1}^{n_2} (y_j - \bar{y})(y_j - \bar{y})^*, \]

where $\bar{x}$, $\bar{y}$ are the respective sample means. Let

\[ L_1 = \frac{|A|^{n_1/2} \cdot |B|^{n_2/2}}{|c_1 A + c_2 B|^{N/2}}, \tag{1.6} \]

where $N = n_1 + n_2$ and $c_k = n_k/N$, $k = 1, 2$. The likelihood ratio test statistic is

\[ T_N = -2\log L_1, \]

and when $n_1, n_2 \to \infty$, we get

\[ T_N = -2\log L_1 \Rightarrow \chi^2_{p(p+1)/2} \tag{1.7} \]

under $H_0$. Of course, in this limit scheme, the data dimension $p$ is held fixed.

However, employing this $\chi^2$ limit distribution for dimensions like 30 or 40 dramatically
increases the size of the test. For instance, simulations in §4.1 show that, for dimension and
sample sizes $(p, n_1, n_2) = (40, 800, 400)$, the test size equals 21.2% instead of the nominal 5%
level. The result is worse for the case $(p, n_1, n_2) = (80, 1600, 800)$, leading to a 49.5% test
size. The reason for this failure of the classical LR test is the following. Modern RMT indicates
that when both the dimension and the sample sizes are large, the likelihood ratio statistic $T_N$
drifts to infinity almost surely. Therefore, the classical $\chi^2$ approximation leads to many
false rejections of $H_0$ in the case of high-dimensional data.

Based on a recent CLT for linear spectral statistics of F-matrices from RMT, we propose a
correction to this LR test in §4. Although this corrected test is constructed under the asymptotic
scheme $n_1 \wedge n_2 \to +\infty$, $y_{n_1} = p/n_1 \to y_1 \in (0,1)$,
$y_{n_2} = p/n_2 \to y_2 \in (0,1)$, simulations demonstrate an overall correct behavior, including
for small or moderate dimensions $p$. For example, for the dimension and sample sizes
$(p, n_1, n_2)$ cited above, the sizes of the corrected test equal 5.6% and 5.2%, respectively,
both close to the nominal 5% level.

Related works include Ledoit and Wolf (2002), Srivastava (2005) and Schott (2007). These
authors propose several procedures in the high-dimensional setting for testing that i) a covariance
matrix is an identity matrix, is proportional to an identity matrix (sphericity) or is a diagonal
matrix, or ii) several covariance matrices are equal. These procedures have the following common
feature: their construction involves some well-chosen distance function between the null and the
alternative hypotheses and relies on the first two spectral moments, namely the statistics
$\operatorname{tr} S_k$ and $\operatorname{tr} S_k^2$ from the sample covariance matrices $S_k$.
Therefore, the procedures proposed by these authors are different from the likelihood-based
procedures we consider here. Another important difference concerns the Gaussian assumption on the
random variables used in all these references. In contrast, for testing the equality of two
covariance matrices, the correction proposed in this paper applies equally to non-Gaussian
high-dimensional data, leading to a valid pseudo-likelihood test.

The rest of the paper is organized as follows. Preliminary and useful RMT results are recalled
in §2. In §3 and §4, we introduce our results for the two tests above. Proofs and technical
derivations are postponed to the last section.

2. Useful results from the random matrix theory 

We first recall several results from RMT, which will be useful for our corrections to the tests.
For any $p \times p$ square matrix $M$ with real eigenvalues $(\lambda_i^M)$, $F_n^M$ denotes the
empirical spectral distribution (ESD) of $M$, that is,

\[ F_n^M(x) = \frac{1}{p}\sum_{i=1}^{p} 1_{\lambda_i^M \le x}, \qquad x \in \mathbb{R}. \]

We will consider random matrices $M$ whose ESD $F_n^M$ converges (in a sense to be made precise)
to a limiting spectral distribution (LSD) $F^M$. To make statistical inference about a parameter
$\theta = \int f(x)\,dF^M(x)$, it is natural to use the estimator

\[ \hat\theta = \int f(x)\,dF_n^M(x) = \frac{1}{p}\sum_{i=1}^{p} f(\lambda_i^M), \]

which is a so-called linear spectral statistic (LSS) of the random matrix $M$.

2.1. CLT for LSS of a high-dimensional sample covariance matrix

Let $\{\xi_{ki} \in \mathbb{C},\ i, k = 1, 2, \ldots\}$ be a double array of i.i.d. complex
variables with mean 0 and variance 1. Set $\xi_i = (\xi_{1i}, \xi_{2i}, \ldots, \xi_{pi})^T$; the
vectors $(\xi_1, \ldots, \xi_n)$ are considered as an i.i.d. sample from some $p$-dimensional
distribution with mean $0_p$ and covariance matrix $I_p$. Therefore the sample covariance
matrix is

\[ S_n = \frac{1}{n}\sum_{i=1}^{n} \xi_i \xi_i^*. \tag{2.1} \]

For $0 < \theta \le 1$, let $a(\theta) = (1 - \sqrt{\theta})^2$ and
$b(\theta) = (1 + \sqrt{\theta})^2$. The Marčenko-Pastur distribution of index $\theta$, denoted
by $F^\theta$, is the distribution on $[a(\theta), b(\theta)]$ with the density function

\[ g_\theta(x) = \frac{1}{2\pi\theta x}\sqrt{[b(\theta) - x][x - a(\theta)]}, \qquad
   a(\theta) \le x \le b(\theta). \]
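The density above can be checked numerically. The sketch below, which uses only the Python standard library, integrates $g_\theta$ with a midpoint rule; the helper names (`mp_support`, `mp_density`, `integrate_mp`) and the index value 0.25 are illustrative choices, not part of the original text.

```python
import math

def mp_support(y):
    """Endpoints a(y), b(y) of the Marcenko-Pastur law of index 0 < y <= 1."""
    return (1 - math.sqrt(y)) ** 2, (1 + math.sqrt(y)) ** 2

def mp_density(x, y):
    """Marcenko-Pastur density g_y(x); zero outside [a(y), b(y)]."""
    a, b = mp_support(y)
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2 * math.pi * y * x)

def integrate_mp(f, y, steps=20000):
    """Midpoint-rule approximation of the integral of f against g_y."""
    a, b = mp_support(y)
    w = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * w
        total += f(x) * mp_density(x, y)
    return total * w

y = 0.25                                      # illustrative index
total_mass = integrate_mp(lambda x: 1.0, y)   # should be close to 1
first_moment = integrate_mp(lambda x: x, y)   # should be close to 1
g_integral = integrate_mp(lambda x: x - math.log(x) - 1, y)
```

The last quantity approximates the integral of $g(x) = x - \log x - 1$ against the law, which is the centering term used for the one-sample test in §3.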



Let

\[ y_n = \frac{p}{n} \to y \in (0, 1), \]

and let $F^y$, $F^{y_n}$ be the Marčenko-Pastur laws of index $y$ and $y_n$, respectively. Let
$\mathcal{U}$ be an open set of the complex plane including $[I_{(0,1)}(y)\,a(y),\ b(y)]$, and
$\mathcal{A}$ be the set of analytic functions $f : \mathcal{U} \to \mathbb{C}$. We consider the
empirical process $G_n := \{G_n(f)\}$ indexed by $\mathcal{A}$,

\[ G_n(f) = p \cdot \int_{-\infty}^{+\infty} f(x)\,[F_n - F^{y_n}](dx), \qquad f \in \mathcal{A},
   \tag{2.2} \]

where $F_n$ is the ESD of $S_n$. The following theorem, which is a specialization of a general
theorem from Bai and Silverstein (2004) (Theorem 1.1), will play a fundamental role in the
derivations below.

Theorem 2.1. Assume that $f_1, \ldots, f_k \in \mathcal{A}$, and $\{\xi_{ij}\}$ are i.i.d. random
variables such that $E\xi_{11} = 0$, $E|\xi_{11}|^2 = 1$, $E|\xi_{11}|^4 < \infty$. Moreover,
$p/n \to y \in (0, 1)$ as $n, p \to \infty$. Then:

(i) Real case. Assume $\{\xi_{ij}\}$ are real and $E(\xi_{11}^4) = 3$. Then the random vector
$(G_n(f_1), \ldots, G_n(f_k))$ weakly converges to a $k$-dimensional Gaussian vector with mean
vector

\[ m(f_j) = \frac{f_j(a(y)) + f_j(b(y))}{4}
   - \frac{1}{2\pi}\int_{a(y)}^{b(y)} \frac{f_j(x)}{\sqrt{4y - (x - 1 - y)^2}}\,dx,
   \qquad j = 1, \ldots, k, \tag{2.3} \]

and covariance function

\[ v(f_j, f_\ell) = -\frac{1}{2\pi^2}\oint\oint
   \frac{f_j(z_1) f_\ell(z_2)}{(m(z_1) - m(z_2))^2}\,dm(z_1)\,dm(z_2),
   \qquad j, \ell \in \{1, \ldots, k\}, \tag{2.4} \]

where $m(z) \equiv m_{\underline{F}^y}(z)$ is the Stieltjes transform of
$\underline{F}^y \equiv (1 - y) I_{[0,\infty)} + y F^y$. The contours in (2.4) are non-overlapping
and both contain the support of $F^y$.

(ii) Complex case. Assume $\{\xi_{ij}\}$ are complex and $E\xi_{11}^2 = 0$,
$E(|\xi_{11}|^4) = 2$. Then the conclusion of (i) also holds, except that the mean vector is zero
and the covariance function is half of the function given in (2.4).

It is worth noticing that Theorem 1.1 in Bai and Silverstein (2004) covers more general sample
covariance matrices of the form $S_n' = T_n^{1/2} S_n T_n^{1/2}$, where $(T_n)$ is a given sequence
of positive-definite Hermitian matrices. In the "white" case $T_n \equiv I$ considered here, the
recent preprint Pastur and Lytova (2008) offers a new extension of the CLT in which the
constraints $E|\xi_{11}|^4 = 3$ or 2, as stated above, are removed.

2.2. CLT for LSS of a high-dimensional F-matrix

Let $\{\xi_{ki} \in \mathbb{C},\ i, k = 1, 2, \ldots\}$ and
$\{\eta_{kj} \in \mathbb{C},\ j, k = 1, 2, \ldots\}$ be two independent double arrays of i.i.d.
complex variables with mean 0 and variance 1. Write
$\xi_i = (\xi_{1i}, \xi_{2i}, \ldots, \xi_{pi})^T$ and
$\eta_j = (\eta_{1j}, \eta_{2j}, \ldots, \eta_{pj})^T$. Also, for any positive integers
$n_1, n_2$, the vectors $(\xi_1, \ldots, \xi_{n_1})$ and $(\eta_1, \ldots, \eta_{n_2})$ can be
thought of as independent samples of size $n_1$ and $n_2$, respectively, from some
$p$-dimensional distributions. Let $S_1$ and $S_2$ be the associated sample covariance matrices,
i.e.

\[ S_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} \xi_i \xi_i^* \qquad\text{and}\qquad
   S_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} \eta_j \eta_j^*. \]

Then the following so-called F-matrix generalizes the classical Fisher statistic to the present
$p$-dimensional case,

\[ V_n = S_1 S_2^{-1}, \tag{2.5} \]

where $n_2 > p$. Here we use the notation $n = (n_1, n_2)$.
Let

\[ y_{n_1} = \frac{p}{n_1} \to y_1 \in (0, 1), \qquad
   y_{n_2} = \frac{p}{n_2} \to y_2 \in (0, 1). \tag{2.6} \]

Under suitable moment conditions, the ESD $F_n^{V_n}$ of $V_n$ has an LSD $F_{y_1, y_2}$, which
has a density [see p. 72 of Bai and Silverstein (2006)] given by

\[ \ell(x) = \begin{cases}
   \dfrac{(1 - y_2)\sqrt{(b - x)(x - a)}}{2\pi x (y_1 + y_2 x)}, & a \le x \le b, \\[2mm]
   0, & \text{otherwise,}
   \end{cases} \tag{2.7} \]

where $a = (1 - y_2)^{-2}\bigl(1 - \sqrt{y_1 + y_2 - y_1 y_2}\bigr)^2$ and
$b = (1 - y_2)^{-2}\bigl(1 + \sqrt{y_1 + y_2 - y_1 y_2}\bigr)^2$.
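The density (2.7) can be sanity-checked numerically. The following standard-library sketch integrates it with a midpoint rule; the helper names and the values $(y_1, y_2) = (0.05, 0.1)$ are illustrative assumptions. Two known properties are used as checks: the density integrates to 1, and its first moment is $1/(1 - y_2)$.

```python
import math

def f_lsd_support(y1, y2):
    """Endpoints a, b of the F-matrix LSD in (2.7)."""
    h = math.sqrt(y1 + y2 - y1 * y2)
    return ((1 - h) / (1 - y2)) ** 2, ((1 + h) / (1 - y2)) ** 2

def f_lsd_density(x, y1, y2):
    """Density (2.7); zero outside [a, b]."""
    a, b = f_lsd_support(y1, y2)
    if x <= a or x >= b:
        return 0.0
    return (1 - y2) * math.sqrt((b - x) * (x - a)) / (2 * math.pi * x * (y1 + y2 * x))

def integrate_f_lsd(f, y1, y2, steps=20000):
    """Midpoint-rule approximation of the integral of f against (2.7)."""
    a, b = f_lsd_support(y1, y2)
    w = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * w
        total += f(x) * f_lsd_density(x, y1, y2)
    return total * w

y1, y2 = 0.05, 0.1                                     # illustrative ratios
total_mass = integrate_f_lsd(lambda x: 1.0, y1, y2)    # close to 1
first_moment = integrate_f_lsd(lambda x: x, y1, y2)    # close to 1 / (1 - y2)
```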
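As before, let $\mathcal{U}$ be an open set of the complex plane including the interval

\[ \left[\, I_{(0,1)}(y_1)\,\frac{(1 - \sqrt{y_1})^2}{(1 + \sqrt{y_2})^2},\
   \frac{(1 + \sqrt{y_1})^2}{(1 - \sqrt{y_2})^2} \,\right] \]

and $\mathcal{A}$ be the set of analytic functions $f : \mathcal{U} \to \mathbb{C}$. Define the
empirical process $G_n := \{G_n(f)\}$ indexed by $\mathcal{A}$,

\[ G_n(f) = p \cdot \int_{-\infty}^{+\infty} f(x)\,[F_n^{V_n} - F_{y_{n_1}, y_{n_2}}](dx),
   \qquad f \in \mathcal{A}. \tag{2.8} \]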

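Here $F_{y_{n_1}, y_{n_2}}$ is the limiting distribution in (2.7) but with $y_{n_k}$ instead of
$y_k$, $k = 1, 2$.

Recently, Zheng (2008) established a general CLT for LSS of large-dimensional F-matrices. The
following theorem is a simplified version quoted from it, which will play an important role.

Theorem 2.2. Let $f_1, \ldots, f_k \in \mathcal{A}$, and assume: for each $p$, the variables
$(\xi_{ij_1})$ and $(\eta_{ij_2})$ are i.i.d., $1 \le i \le p$, $1 \le j_1 \le n_1$,
$1 \le j_2 \le n_2$; $E\xi_{11} = E\eta_{11} = 0$, $E|\xi_{11}|^4 = E|\eta_{11}|^4 < \infty$,
$y_{n_1} = p/n_1 \to y_1 \in (0, 1)$, $y_{n_2} = p/n_2 \to y_2 \in (0, 1)$. Then:

(i) Real case. Assume $(\xi_{ij})$ and $(\eta_{ij})$ are real and
$E|\xi_{11}|^2 = E|\eta_{11}|^2 = 1$. Then the random vector $(G_n(f_1), \ldots, G_n(f_k))$ weakly
converges to a $k$-dimensional Gaussian vector with mean vector

\[ m(f_j) = \lim_{r \to 1+}\,[(2.9) + (2.10) + (2.11)], \qquad j = 1, \ldots, k, \]

where (2.9), (2.10) and (2.11) denote the three terms

\[ \frac{1}{4\pi i}\oint_{|\zeta| = 1} f_j(z(\zeta))
   \left[\frac{1}{\zeta - r^{-1}} + \frac{1}{\zeta + r^{-1}}
   - \frac{2}{\zeta + \frac{y_2}{hr}}\right] d\zeta, \tag{2.9} \]

\[ \frac{\beta \cdot y_1 (1 - y_2)^2}{2\pi i \cdot h^2}\oint_{|\zeta| = 1} f_j(z(\zeta))\,
   \frac{1}{\bigl(\zeta + \frac{y_2}{hr}\bigr)^3}\,d\zeta, \tag{2.10} \]

\[ \frac{\beta \cdot y_2 (1 - y_2)}{2\pi i \cdot h}\oint_{|\zeta| = 1} f_j(z(\zeta))\,
   \frac{\zeta + \frac{1}{hr}}{\bigl(\zeta + \frac{y_2}{hr}\bigr)^3}\,d\zeta, \tag{2.11} \]

with $z(\zeta) = (1 - y_2)^{-2}\,[1 + h^2 + 2h\,\Re(\zeta)]$, $h = \sqrt{y_1 + y_2 - y_1 y_2}$ and
$\beta = E|\xi_{11}|^4 - 3$, and with covariance function, as $1 < r_1 < r_2 \downarrow 1$,

\[ v(f_j, f_\ell) = \lim_{1 < r_1 < r_2 \to 1+}\,[(2.12) + (2.13)], \qquad
   j, \ell \in \{1, \ldots, k\}, \]

where (2.12) and (2.13) denote the two terms

\[ -\frac{1}{2\pi^2}\oint_{|\zeta_2| = 1}\oint_{|\zeta_1| = 1}
   \frac{f_j(z(r_1\zeta_1))\,f_\ell(z(r_2\zeta_2))\,r_1 r_2}{(r_2\zeta_2 - r_1\zeta_1)^2}\,
   d\zeta_1\,d\zeta_2, \tag{2.12} \]

\[ -\frac{\beta \cdot (y_1 + y_2)(1 - y_2)^2}{4\pi^2 h^2}
   \oint_{|\zeta_1| = 1} \frac{f_j(z(\zeta_1))}{\bigl(\zeta_1 + \frac{y_2}{h}\bigr)^2}\,d\zeta_1
   \oint_{|\zeta_2| = 1} \frac{f_\ell(z(\zeta_2))}{\bigl(\zeta_2 + \frac{y_2}{h}\bigr)^2}\,d\zeta_2.
   \tag{2.13} \]

(ii) Complex case. Assume $(\xi_{ij})$ and $(\eta_{ij})$ are complex and
$E(\xi_{11}^2) = E(\eta_{11}^2) = 0$. Then the conclusion of (i) also holds, except that the means
are $\lim_{r \to 1+}[(2.10) + (2.11)]$ and the covariance function is

\[ \lim_{1 < r_1 < r_2 \to 1+}\left[\tfrac{1}{2} \cdot (2.12) + (2.13)\right], \]

where $\beta = E|\xi_{11}|^4 - 2$.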



We should point out that Zheng's CLT for F-matrices covers more general situations than those
cited in Theorem 2.2. In particular, the fourth moments $E|\xi_{11}|^4$ and $E|\eta_{11}|^4$ can
be different.

The following lemma will be used in §4 for an application of Theorem 2.2 to obtain the formulas
(4.5) and (4.6).

Lemma 2.1. For the function $f(x) = \log(a + bx)$, $x \in \mathbb{R}$, $a, b > 0$, let $(c, d)$ be
the unique solution to the equations

\[ \begin{cases}
   c^2 + d^2 = a(1 - y_2)^2 + b(1 + h^2), \\
   cd = bh, \\
   0 < d < c.
   \end{cases} \]

Analogously, let $\gamma, \eta$ be the constants similar to $(c, d)$ but for the function
$g(x) = \log(\alpha + \beta x)$, $\alpha > 0$, $\beta > 0$. Then the mean and covariance functions
in (2.9) and (2.12) are equal to

\[ m(f) = \frac{1}{2}\log\frac{(c^2 - d^2)h^2}{(ch - y_2 d)^2}, \]

\[ v(f, g) = 2bhd^{-1}c^{-1}\log\frac{c\gamma}{c\gamma - d\eta}, \]

where the factor $bhd^{-1}c^{-1}$ equals 1 because $cd = bh$.
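The constants of Lemma 2.1 can be computed in closed form, since $c + d = \sqrt{(c^2 + d^2) + 2cd}$ and $c - d = \sqrt{(c^2 + d^2) - 2cd}$. The sketch below is a hedged illustration using only the standard library; the function names are hypothetical, and the values $a = 0$, $b = 1$ (that is, $f = \log x$, a boundary case of the lemma) are chosen because they give $c = 1$, $d = h$.

```python
import math

def lemma_constants(a, b, y1, y2):
    """Solve c^2 + d^2 = a(1-y2)^2 + b(1+h^2), cd = bh, 0 < d < c; return (c, d, h)."""
    h = math.sqrt(y1 + y2 - y1 * y2)
    s = a * (1 - y2) ** 2 + b * (1 + h * h)   # c^2 + d^2
    q = b * h                                  # c * d
    plus = math.sqrt(s + 2 * q)                # c + d
    minus = math.sqrt(s - 2 * q)               # c - d, nonnegative since d < c
    return (plus + minus) / 2, (plus - minus) / 2, h

def lemma_mean(a, b, y1, y2):
    """m(f) of Lemma 2.1 for f(x) = log(a + b x)."""
    c, d, h = lemma_constants(a, b, y1, y2)
    return 0.5 * math.log((c * c - d * d) * h * h / (c * h - y2 * d) ** 2)

def lemma_cov(a, b, alpha, beta, y1, y2):
    """v(f, g) of Lemma 2.1; the prefactor b*h/(c*d) equals 1 and is omitted."""
    c, d, _ = lemma_constants(a, b, y1, y2)
    gam, eta, _ = lemma_constants(alpha, beta, y1, y2)
    return 2 * math.log(c * gam / (c * gam - d * eta))

y1, y2 = 0.05, 0.1                      # illustrative ratios
m_log = lemma_mean(0.0, 1.0, y1, y2)    # f = log x
v_log = lemma_cov(0.0, 1.0, 0.0, 1.0, y1, y2)
```

For $f = g = \log x$ these reduce to $\frac12\log\frac{1 - y_1}{1 - y_2}$ and $-2\log[(1 - y_1)(1 - y_2)]$, which is a convenient consistency check.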



3. Testing the hypothesis that a high-dimensional covariance matrix is equal to a 
given matrix 

To test the hypothesis $H_0 : \Sigma_p = I_p$, let $S$ be the sample covariance matrix and $T_n$
the likelihood ratio statistic, as defined in (1.2) and (1.4), respectively. For
$\xi_i = x_i - \mu_p$, the array $\{\xi_i\}_{i = 1, \ldots, n}$ contains $p$-dimensional standard
normal variables under $H_0$. Let

\[ S_n = \frac{1}{n}\sum_{i=1}^{n} \xi_i \xi_i^* \]

and

\[ \tilde{L}^* = \operatorname{tr} S_n - \log|S_n| - p. \]

Theorem 3.1. Assume that the conditions of Theorem 2.1 hold, $L^*$ is defined as in (1.3) and
$g(x) = x - \log x - 1$. Then, under $H_0$ and when $n \to \infty$,

\[ \tilde{T}_n = v(g)^{-1/2}\left[L^* - p \cdot F^{y_n}(g) - m(g)\right] \Rightarrow N(0, 1),
   \tag{3.1} \]

where $F^{y_n}$ is the Marčenko-Pastur law of index $y_n$, and $m(g)$, $v(g)$ and $F^{y_n}(g)$ are
given in (3.3), (3.4) and (3.6) below.

Proof. Because the difference between $S$ and $S_n$ is a rank-1 matrix, $S$ and $S_n$ have the
same LSD. So $L^*$ and $\tilde{L}^*$ have the same asymptotic distribution. We also have

\[ \begin{aligned}
   \tilde{L}^* &= \operatorname{tr} S_n - \log|S_n| - p \\
   &= \sum_{i=1}^{p}\left(\lambda_i^{S_n} - \log\lambda_i^{S_n} - 1\right)
    = p \cdot \int (x - \log x - 1)\,dF_n(x) \\
   &= p \cdot \int g(x)\,d\bigl(F_n(x) - F^{y_n}(x)\bigr) + p \cdot F^{y_n}(g),
   \end{aligned} \]

so that

\[ G_n(g) = \tilde{L}^* - p \cdot F^{y_n}(g). \tag{3.2} \]

By Theorem 2.1, $G_n(g)$ weakly converges to a Gaussian variable with mean

\[ m(g) = -\frac{\log(1 - y)}{2} \tag{3.3} \]

and variance

\[ v(g) = -2\log(1 - y) - 2y \tag{3.4} \]

for the real case, which are calculated in §5. For the complex case, the mean $m(g)$ is zero and
the variance is half of $v(g)$. Then, by (3.2), we arrive at

\[ \tilde{L}^* - p \cdot F^{y_n}(g) \Rightarrow N(m(g), v(g)), \tag{3.5} \]

where

\[ F^{y_n}(g) = 1 - \frac{y_n - 1}{y_n}\log(1 - y_n) \tag{3.6} \]

can be calculated from the density of the LSD of the sample covariance matrix (see §5). Because
$L^*$ and $\tilde{L}^*$ have the same asymptotic distribution, (3.5) finally gives

\[ \tilde{T}_n = v(g)^{-1/2}\left[L^* - p \cdot F^{y_n}(g) - m(g)\right] \Rightarrow N(0, 1). \]

□
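The corrected statistic of Theorem 3.1 is straightforward to compute once (3.3), (3.4) and (3.6) are available. The sketch below is one possible implementation for the real case, assuming NumPy and plugging $y_n = p/n$ into $m(g)$ and $v(g)$; the function name, the random seed and the simulated data are illustrative assumptions, not the authors' code.

```python
import math
import numpy as np

def clrt_one_sample(X):
    """Corrected LRT statistic (3.1) for H0: Sigma = I, real case.

    X is an (n, p) array with p < n; y = p/n replaces the limit y in
    m(g) and v(g), which is an assumption of this sketch.
    """
    n, p = X.shape
    y = p / n
    S = np.cov(X, rowvar=False, bias=True)       # (1.2), 1/n normalisation
    sign, logdet = np.linalg.slogdet(S)
    L_star = np.trace(S) - logdet - p            # (1.3)
    F_g = 1 - (y - 1) / y * math.log(1 - y)      # (3.6)
    m_g = -math.log(1 - y) / 2                   # (3.3)
    v_g = -2 * math.log(1 - y) - 2 * y           # (3.4)
    return (L_star - p * F_g - m_g) / math.sqrt(v_g)

# Simulated data under H0 with (p, n) = (50, 500), as in Table 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
stat = clrt_one_sample(X)
```

Under $H_0$ the value `stat` is approximately standard normal, so it would be compared with a normal quantile rather than with a chi-square quantile.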

3.1. Simulation study I 

For different values of $(p, n)$, we compute the realized sizes of the traditional likelihood
ratio test (LRT) and of the corrected likelihood ratio test (CLRT) proposed above. The nominal test
level is set to $\alpha = 0.05$, and for each $(p, n)$ we run 10,000 independent replications with
real Gaussian variables. Results are given in Table 1 and Figure 1 below.







             CLRT                                      LRT
(p, n)       Size     Difference with 5%   Power       Size     Power
(5, 500)     0.0803   0.0303               0.6013      0.0521   0.5233
(10, 500)    0.0690   0.0190               0.9517      0.0555   0.9417
(50, 500)    0.0594   0.0094               1           0.2252   1
(100, 500)   0.0537   0.0037               1           0.9757   1
(300, 500)   0.0515   0.0015               1           1        1

Table 1

Sizes and powers of the traditional LRT and the corrected LRT, based on 10,000 independent
replications with real Gaussian variables. Powers are estimated under the alternative
$\Sigma_p = \mathrm{diag}(1, 0.05, 0.05, 0.05, \ldots)$.

As seen from Table 1, the traditional LRT almost always rejects $H_0$ when $p$ is large, e.g.
$p = 100$ or 300 (sizes 0.9757 and 1), while the sizes of the corrected LRT stay close to the
nominal level (0.0537 and 0.0515). For moderate dimensions like $p = 50$, the corrected LRT still
performs correctly (size 0.0594), while the traditional LRT has a size much higher than 5%
(0.2252).

4. Testing the equality of two high-dimensional covariance matrices 

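Let $(x_i)$, $i = 1, \ldots, n_1$, and $(y_j)$, $j = 1, \ldots, n_2$, be observations from two
normal populations $N(\mu_k, \Sigma_k)$, $k = 1, 2$, respectively. We examine the test defined in
(1.5) and (1.6). The aim is to find a good scaling of the LR statistic $T_N$, such that the scaled
statistic weakly converges to some limiting distribution. Let

\[ \xi_i = \Sigma^{-1/2}(x_i - \mu_1), \qquad \eta_j = \Sigma^{-1/2}(y_j - \mu_2), \]

where $\Sigma = \Sigma_1 = \Sigma_2$ denotes the common covariance matrix under $H_0$. Note that,
strictly speaking, the vectors $(x_i)$, $(y_j)$ and the matrices $\Sigma, \Sigma_1, \Sigma_2$
depend on $p$. However, we do not indicate this dependence in the notation, for ease of
exposition. Due to the Gaussian assumption, the arrays $(\xi_i)_{i = 1, \ldots, n_1}$ and
$(\eta_j)_{j = 1, \ldots, n_2}$ contain i.i.d. $N(0, 1)$ variables, to which we can apply
Theorem 2.2. Let

\[ S_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} \xi_i \xi_i^* = \Sigma^{-1/2} C\, \Sigma^{-1/2}, \qquad
   S_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} \eta_j \eta_j^* = \Sigma^{-1/2} D\, \Sigma^{-1/2}, \]

where

\[ C = \frac{1}{n_1}\sum_{i=1}^{n_1} (x_i - \mu_1)(x_i - \mu_1)^*, \qquad
   D = \frac{1}{n_2}\sum_{j=1}^{n_2} (y_j - \mu_2)(y_j - \mu_2)^*. \]

Note that $V_n = S_1 S_2^{-1}$ forms a random F-matrix and we have

\[ \tilde{L}_1 = \frac{|S_1|^{n_1/2} \cdot |S_2|^{n_2/2}}{|c_1 S_1 + c_2 S_2|^{N/2}}
   = \frac{|C|^{n_1/2} \cdot |D|^{n_2/2}}{|c_1 C + c_2 D|^{N/2}}. \tag{4.1} \]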
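Theorem 4.1. Assume that the conditions of Theorem 2.2 hold under $H_0$, $L_1$ is defined as in
(1.6) and

\[ f(x) = \log(y_{n_1} + y_{n_2} x) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log x
   - \log(y_{n_1} + y_{n_2}). \]

Then, under $H_0$ and as $n_1 \wedge n_2 \to \infty$,

\[ \tilde{T}_N = v(f)^{-1/2}\left[-\frac{2\log L_1}{N} - p \cdot F_{y_{n_1}, y_{n_2}}(f) - m(f)
   \right] \Rightarrow N(0, 1). \tag{4.2} \]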

Proof. As $A - C$ and $B - D$ are rank-1 random matrices, $AB^{-1}$ and $CD^{-1}$ have the same
LSD. Also, by (4.1), $L_1$ and $\tilde{L}_1$ have the same asymptotic distribution. We have

\[ \begin{aligned}
   -\frac{2}{N}\log\tilde{L}_1
   &= -\frac{2}{N}\log\left(\frac{|S_1|^{n_1/2} \cdot |S_2|^{n_2/2}}
      {|c_1 S_1 + c_2 S_2|^{N/2}}\right) \\
   &= \log|c_1 V_n + c_2 I_p| - c_1 \log|V_n| \\
   &= \sum_{i=1}^{p}\left[\log(c_1 \lambda_i^{V_n} + c_2) - c_1 \log\lambda_i^{V_n}\right] \\
   &= p \cdot \int\left[\log(c_1 x + c_2) - c_1 \log x\right] dF_n^{V_n}(x).
   \end{aligned} \]

Define $f(x) = \log(c_1 x + c_2) - c_1 \log x$. Since
$c_1 = n_1/N = y_{n_2}/(y_{n_1} + y_{n_2})$ and $c_2 = n_2/N = y_{n_1}/(y_{n_1} + y_{n_2})$, it
can also be written as

\[ f(x) = \log(y_{n_1} + y_{n_2} x) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log x
   - \log(y_{n_1} + y_{n_2}). \tag{4.3} \]

From

\[ -\frac{2\log\tilde{L}_1}{N} = p \cdot \int f(x)\,dF_n^{V_n}(x)
   = p \cdot \int f(x)\,d\bigl(F_n^{V_n}(x) - F_{y_{n_1}, y_{n_2}}(x)\bigr)
   + p \cdot F_{y_{n_1}, y_{n_2}}(f), \]

we get

\[ G_n(f) = -\frac{2\log\tilde{L}_1}{N} - p \cdot F_{y_{n_1}, y_{n_2}}(f). \tag{4.4} \]

By Theorem 2.2, $G_n(f)$ weakly converges to a Gaussian variable with mean

\[ m(f) = \frac{1}{2}\left[\log\frac{y_1 + y_2 - y_1 y_2}{y_1 + y_2}
   - \frac{y_1}{y_1 + y_2}\log(1 - y_2) - \frac{y_2}{y_1 + y_2}\log(1 - y_1)\right] \tag{4.5} \]

and variance

\[ v(f) = -\frac{2y_2^2}{(y_1 + y_2)^2}\log(1 - y_1) - \frac{2y_1^2}{(y_1 + y_2)^2}\log(1 - y_2)
   - 2\log\frac{y_1 + y_2}{y_1 + y_2 - y_1 y_2} \tag{4.6} \]

for the real case, which are calculated using Lemma 2.1 in §5. For the complex case, the mean
$m(f)$ is zero and the variance is half of $v(f)$. In other words,

\[ -\frac{2\log\tilde{L}_1}{N} - p \cdot F_{y_{n_1}, y_{n_2}}(f) \Rightarrow N(m(f), v(f)),
   \tag{4.7} \]

where

\[ \begin{aligned}
   F_{y_{n_1}, y_{n_2}}(f)
   &= \frac{y_{n_1} + y_{n_2} - y_{n_1} y_{n_2}}{y_{n_1} y_{n_2}}
      \log\frac{y_{n_1} + y_{n_2}}{y_{n_1} + y_{n_2} - y_{n_1} y_{n_2}} \\
   &\quad + \frac{y_{n_1}(1 - y_{n_2})}{y_{n_2}(y_{n_1} + y_{n_2})}\log(1 - y_{n_2})
      + \frac{y_{n_2}(1 - y_{n_1})}{y_{n_1}(y_{n_1} + y_{n_2})}\log(1 - y_{n_1})
   \end{aligned} \]

is derived using the density of $F_{y_{n_1}, y_{n_2}}$ in §5. Because $L_1$ and $\tilde{L}_1$ have
the same asymptotic distribution, (4.7) gives, letting $n_1 \wedge n_2 \to \infty$,

\[ \tilde{T}_N = v(f)^{-1/2}\left[-\frac{2\log L_1}{N} - p \cdot F_{y_{n_1}, y_{n_2}}(f) - m(f)
   \right] \Rightarrow N(0, 1). \]

□
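To make Theorem 4.1 concrete, here is a hedged sketch of the corrected two-sample statistic for the real case. It assumes NumPy, plugs $y_{n_k} = p/n_k$ into (4.5) and (4.6), and uses the identity $-\frac{2}{N}\log L_1 = \log|c_1 A + c_2 B| - c_1\log|A| - c_2\log|B|$, which follows from (1.6). The function names, seed and simulated data are illustrative and not taken from the paper.

```python
import math
import numpy as np

def centering_terms(y1, y2):
    """Return (F(f), m(f), v(f)) from the proof of Theorem 4.1, real case."""
    s = y1 + y2
    h2 = s - y1 * y2
    F_f = (h2 / (y1 * y2)) * math.log(s / h2) \
        + y1 * (1 - y2) / (y2 * s) * math.log(1 - y2) \
        + y2 * (1 - y1) / (y1 * s) * math.log(1 - y1)
    m_f = 0.5 * (math.log(h2 / s)
                 - y1 / s * math.log(1 - y2)
                 - y2 / s * math.log(1 - y1))                  # (4.5)
    v_f = (-2 * y2 ** 2 / s ** 2 * math.log(1 - y1)
           - 2 * y1 ** 2 / s ** 2 * math.log(1 - y2)
           - 2 * math.log(s / h2))                             # (4.6)
    return F_f, m_f, v_f

def clrt_two_sample(X, Y):
    """Corrected LRT statistic (4.2) for H0: Sigma_1 = Sigma_2."""
    n1, p = X.shape
    n2 = Y.shape[0]
    N = n1 + n2
    c1, c2 = n1 / N, n2 / N
    A = np.cov(X, rowvar=False, bias=True)    # 1/n1 normalisation
    B = np.cov(Y, rowvar=False, bias=True)    # 1/n2 normalisation
    logdet = lambda M: np.linalg.slogdet(M)[1]
    lr = logdet(c1 * A + c2 * B) - c1 * logdet(A) - c2 * logdet(B)  # -2 log L1 / N
    F_f, m_f, v_f = centering_terms(p / n1, p / n2)
    return (lr - p * F_f - m_f) / math.sqrt(v_f)

# Simulated data under H0 with (p, n1, n2) = (10, 200, 100), as in Table 2.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
Y = rng.standard_normal((100, 10))
T_N = clrt_two_sample(X, Y)
```

Working with log-determinants rather than with $L_1$ itself avoids overflow, since the determinants are raised to powers of order $N$.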
[Figure 1: plot for n = 500; horizontal axis: Dimension.]

Figure 1. Realized sizes of the traditional LRT and the corrected LRT for different dimensions $p$
with real Gaussian variables. 10,000 independent runs with 5% nominal level and sample size
$n = 500$.



(y1, y2) = (0.05, 0.05)

                      CLRT                                      LRT
(p, n1, n2)           Size     Difference with 5%   Power       Size     Power
(5, 100, 100)         0.0770   0.0270               1           0.0582   1
(10, 200, 200)        0.0680   0.0180               1           0.0684   1
(20, 400, 400)        0.0593   0.0093               1           0.0872   1
(40, 800, 800)        0.0526   0.0026               1           0.1339   1
(80, 1600, 1600)      0.0501   0.0001               1           0.2687   1
(160, 3200, 3200)     0.0491   -0.0009              1           0.6488   1
(320, 6400, 6400)     0.0447   -0.0053              0.9671      1        1

(y1, y2) = (0.05, 0.1)

                      CLRT                                      LRT
(p, n1, n2)           Size     Difference with 5%   Power       Size     Power
(5, 100, 50)          0.0781   0.0281               0.9925      0.0640   0.9849
(10, 200, 100)        0.0617   0.0117               0.9847      0.0752   0.9904
(20, 400, 200)        0.0573   0.0073               0.9775      0.1104   0.9938
(40, 800, 400)        0.0561   0.0061               0.9765      0.2115   0.9975
(80, 1600, 800)       0.0521   0.0021               0.9702      0.4954   0.9998
(160, 3200, 1600)     0.0520   0.0020               0.9702      0.9433   1
(320, 6400, 3200)     0.0510   0.0010               1           0.9939   1

Table 2

Sizes and powers of the traditional LRT and the corrected LRT based on 10,000 independent
replications using real Gaussian variables. Powers are estimated under the alternative
$\Sigma_1\Sigma_2^{-1} = \mathrm{diag}(3, 1, 1, 1, \ldots)$. Upper: $y_1 = y_2 = 0.05$.
Bottom: $y_1 = 0.05$, $y_2 = 0.1$.
4.1. Simulation study II

For different values of $(p, n_1, n_2)$, we compute the realized sizes of the traditional LRT and
the corrected LRT with 10,000 independent replications. The nominal test level is
$\alpha = 0.05$ and we use real Gaussian variables. Results are summarized in Table 2 and
Figure 2.

As we can see, when the dimension $p$ increases, the traditional LRT leads to a dramatically
high test size while the corrected LRT remains accurate. Furthermore, for moderate dimensions
like $p = 20$ or 40, the sizes of the traditional LRT are much higher than 5%, whereas those
of the corrected LRT are very close to it. Looking more closely at the column showing the
difference with 5%, we note that this difference decreases rapidly as $p$ increases for the
corrected test. Figure 2 gives a vivid picture of these comparisons between the traditional LRT
and the corrected LRT in terms of test sizes.



4.2. A pseudo-likelihood test for high-dimensional non-Gaussian data

As said in the Introduction, previous related works such as Ledoit and Wolf (2002), Srivastava
(2005) and Schott (2007) all assume Gaussian variables. In contrast, Theorem 4.1 applies to general
distributions having a finite fourth moment. For such non-Gaussian data, we regard the corrected
LRT as a generalized pseudo-likelihood ratio test (or Gaussian LRT).

Moreover, the methods proposed by these authors all rely on an appropriate normalization of the
trace of the squared difference between two sample covariance matrices, following the idea of Bai
and Saranadasa (1996). We believe that their methods depend strongly on the normality assumption
(which is supported by the simulation results below). On the other hand, it is generally
understood that the LRT extracts much more information from the data, and that its poor
performance observed so far is caused only by its large bias when the dimension is large. For this
intuitive reason, we confine ourselves to modifying the LRT.

Let us develop an example in more detail. Assume that $x$ follows a normalized $t$-distribution
with 5 degrees of freedom, that is, $x = \sqrt{3/5}\;t(5)$, and that $x$ and $y$ are i.i.d.; hence
$Ex = Ey = 0$, $E|x|^2 = E|y|^2 = 1$ and $E|x|^4 = E|y|^4 = 9$. We still employ the result of
Theorem 4.1 for the test of equality between two covariance matrices, with

\[ m_1(f) = \frac{1}{2}\left[\log\frac{y_1 + y_2 - y_1 y_2}{y_1 + y_2}
   - \frac{y_1}{y_1 + y_2}\log(1 - y_2) - \frac{y_2}{y_1 + y_2}\log(1 - y_1)
   + \frac{6 y_1^2 y_2}{(y_1 + y_2)^2} + \frac{6 y_1 y_2^2}{(y_1 + y_2)^2}\right] \tag{4.8} \]

and

\[ v_1(f) = -\frac{2y_2^2}{(y_1 + y_2)^2}\log(1 - y_1) - \frac{2y_1^2}{(y_1 + y_2)^2}\log(1 - y_2)
   - 2\log\frac{y_1 + y_2}{y_1 + y_2 - y_1 y_2} \tag{4.9} \]

instead of $m(f)$ and $v(f)$ for the real case, respectively. (4.8) and (4.9) are calculated in §5.
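The moment values in this example can be checked with a few lines of standard-library Python. The sketch below uses the known fourth moment of a standardized Student $t$ variable with $\nu > 4$ degrees of freedom, $3(\nu - 2)/(\nu - 4)$, and evaluates the mean in (4.8). Writing the extra term as $\beta y_1 y_2 / (2(y_1 + y_2))$ for a general excess $\beta = E|x|^4 - 3$ is an assumption made here for illustration; it coincides with the last two terms of (4.8) when $\beta = 6$. The function names are hypothetical.

```python
import math

def normalized_t_fourth_moment(nu):
    """E|x|^4 for x = t(nu) / sd(t(nu)), valid for nu > 4."""
    return 3.0 * (nu - 2) / (nu - 4)

def pseudo_lrt_mean(y1, y2, beta):
    """Mean of the limiting law: Gaussian part (4.5) plus an assumed
    fourth-moment term beta*y1*y2 / (2*(y1 + y2)); beta = 6 gives (4.8)."""
    s = y1 + y2
    h2 = s - y1 * y2
    gaussian_part = 0.5 * (math.log(h2 / s)
                           - y1 / s * math.log(1 - y2)
                           - y2 / s * math.log(1 - y1))
    return gaussian_part + beta * y1 * y2 / (2 * s)

fourth = normalized_t_fourth_moment(5)   # fourth moment of the normalized t(5)
beta = fourth - 3.0                      # excess over the Gaussian value 3
shift = pseudo_lrt_mean(0.1, 0.05, beta) - pseudo_lrt_mean(0.1, 0.05, 0.0)
```

The variable `shift` is the amount by which the heavier tails move the centering of the statistic relative to the Gaussian case.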

Table 3 below summarizes a simulation study in which we compare this corrected pseudo-LRT with the
test proposed in Schott (2007). We use 1,000 independent replications with the above
$t$-distributed variables. Again, the nominal test level is $\alpha = 0.05$. As we can see, the
corrected pseudo-LRT performs correctly, while Schott's test is no longer valid here (its sizes
range from 0.517 to 0.854), since the variables are not Gaussian.



(y1, y2) = (0.1, 0.05)

(p, n1, n2)           CLRT Size   Schott's Size
(10, 100, 200)        0.067       0.517
(20, 200, 400)        0.065       0.603
(40, 400, 800)        0.054       0.703
(80, 800, 1600)       0.048       0.764
(160, 1600, 3200)     0.045       0.826
(320, 3200, 6400)     0.051       0.854

Table 3

Sizes of the corrected pseudo-likelihood ratio test and Schott's test for the case of
$y_1 = 0.1$, $y_2 = 0.05$, based on 1,000 independent replications with normalized
$t$-distributed variables with 5 degrees of freedom.



5. Proofs 



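Proof of (3.3)

By Theorem 2.1, for $g(x) = x - \log x - 1$, using the change of variable
$x = 1 + y - 2\sqrt{y}\cos\theta$, $0 \le \theta \le \pi$, we have

\[ \begin{aligned}
   m(g) &= \frac{g(a(y)) + g(b(y))}{4}
      - \frac{1}{2\pi}\int_{a(y)}^{b(y)} \frac{g(x)}{\sqrt{4y - (x - 1 - y)^2}}\,dx \\
   &= \frac{y - \log(1 - y)}{2} - \frac{1}{2\pi}\int_0^{\pi}
      \left[1 + y - 2\sqrt{y}\cos\theta - \log(1 + y - 2\sqrt{y}\cos\theta) - 1\right] d\theta \\
   &= \frac{y - \log(1 - y)}{2} - \frac{1}{4\pi}\int_0^{2\pi}
      \left[y - 2\sqrt{y}\cos\theta - \log|1 - \sqrt{y}e^{i\theta}|^2\right] d\theta \\
   &= \frac{y - \log(1 - y)}{2} - \frac{y}{2} = -\frac{\log(1 - y)}{2},
   \end{aligned} \]

where $\int_0^{2\pi} \log|1 - \sqrt{y}e^{i\theta}|^2\,d\theta = 0$ is calculated in Bai and
Silverstein (2004).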



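Proof of (3.4)

For $g(x) = x - \log x - 1$, by Theorem 2.1, we have

\[ v(g) = -\frac{1}{2\pi^2}\oint\oint \frac{g(z_1) g(z_2)}{(m(z_1) - m(z_2))^2}\,
   dm(z_1)\,dm(z_2) \]

and

\[ g(z_1) g(z_2) = z_1 z_2 - z_1 \log z_2 - z_2 \log z_1 + \log z_1 \log z_2
   - z_1 + \log z_1 - z_2 + \log z_2 + 1. \]

It is easy to see that $v(1, 1) = 0$, where 1 denotes the constant function equal to 1. For the
Stieltjes transform $m(z)$ of $\underline{F}^y$, the following equation is given in Bai and
Silverstein (2004): for $z \in \mathbb{C}^+$,

\[ z = -\frac{1}{m(z)} + \frac{y}{1 + m(z)}. \tag{5.1} \]

Let $m_i = m(z_i)$, $i = 1, 2$. For fixed $m_2$, on a contour enclosing $(y - 1)^{-1}$ and $-1$,
but not 0, we have

\[ \begin{aligned}
   \oint \frac{\log(z(m_1))}{(m_1 - m_2)^2}\,dm_1
   &= \oint \frac{\frac{1}{m_1^2} - \frac{y}{(1 + m_1)^2}}
      {\left(-\frac{1}{m_1} + \frac{y}{1 + m_1}\right)(m_1 - m_2)}\,dm_1 \\
   &= \oint \frac{(1 + m_1)^2 - y m_1^2}{y m_1 (m_1 - m_2)}
      \left[-\frac{1}{m_1 + 1} + \frac{1}{m_1 - \frac{1}{y - 1}}\right] dm_1 \\
   &= 2\pi i \cdot \left(\frac{1}{m_2 + 1} - \frac{1}{m_2 - \frac{1}{y - 1}}\right)
   \end{aligned} \]

and

\[ \oint \frac{-\frac{1}{m_1} + \frac{y}{1 + m_1}}{(m_1 - m_2)^2}\,dm_1
   = 2\pi i \cdot \frac{y}{(m_2 + 1)^2}. \]

Then we also get $v(-z_1 + \log z_1, 1) = 0$. Similarly, $v(1, -z_2 + \log z_2) = 0$. Furthermore,

\[ v(z_1, z_2) = \frac{y}{\pi i}\oint \frac{1}{(m_2 + 1)^2}
   \left(\frac{y}{1 + m_2} + \sum_{j=0}^{\infty} (1 + m_2)^j\right) dm_2 = 2y, \]

where $-1/m_2$ has been expanded as $\sum_{j \ge 0} (1 + m_2)^j$, and

\[ \begin{aligned}
   v(z_1, \log z_2) &= \frac{1}{\pi i}\oint
      \left(\frac{1}{m_2 + 1} - \frac{1}{m_2 - 1/(y - 1)}\right)
      \left(-\frac{1}{m_2} + \frac{y}{1 + m_2}\right) dm_2 \\
   &= \frac{1}{\pi i}\oint
      \left(\frac{1}{m_2 + 1} - \frac{1}{m_2 - 1/(y - 1)}\right)
      \left(\frac{y}{1 + m_2} + \sum_{j=0}^{\infty} (1 + m_2)^j\right) dm_2 \\
   &= 2y.
   \end{aligned} \]

By a computation in Bai and Silverstein (2004), we know that
$v(\log z_1, \log z_2) = -2\log(1 - y)$. Finally, we obtain

\[ \begin{aligned}
   v(g) &= v(z_1, z_2) + v(\log z_1, \log z_2) - 2v(z_1, \log z_2) \\
   &\quad + v(-z_1 + \log z_1, 1) + v(1, -z_2 + \log z_2) + v(1, 1) \\
   &= -2\log(1 - y) - 2y.
   \end{aligned} \]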



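Proof of (3.6)

Since $F^{y_n}$ is the Marčenko-Pastur law of index $y_n$, using the change of variable
$x = 1 + y_n - 2\sqrt{y_n}\cos\theta$, $0 \le \theta \le \pi$, we have

\[ \begin{aligned}
   F^{y_n}(g) &= \int_{a(y_n)}^{b(y_n)} \frac{x - \log x - 1}{2\pi x y_n}
      \sqrt{(b(y_n) - x)(x - a(y_n))}\,dx \\
   &= \frac{1}{2\pi y_n}\int_0^{\pi}\left[1 -
      \frac{\log(1 + y_n - 2\sqrt{y_n}\cos\theta) + 1}{1 + y_n - 2\sqrt{y_n}\cos\theta}\right]
      4 y_n \sin^2\theta\,d\theta \\
   &= \frac{1}{2\pi}\int_0^{2\pi}\left[2\sin^2\theta -
      \frac{2\sin^2\theta}{1 + y_n - 2\sqrt{y_n}\cos\theta}
      \left(\log|1 - \sqrt{y_n}e^{i\theta}|^2 + 1\right)\right] d\theta \\
   &= 1 - \frac{y_n - 1}{y_n}\log(1 - y_n),
   \end{aligned} \]

where

\[ \frac{1}{2\pi}\int_0^{2\pi} \frac{2\sin^2\theta}{1 + y_n - 2\sqrt{y_n}\cos\theta}
   \log|1 - \sqrt{y_n}e^{i\theta}|^2\,d\theta = \frac{y_n - 1}{y_n}\log(1 - y_n) - 1 \]

is calculated in Bai and Silverstein (2004).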



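Proof of Lemma 2.1

We use the change of variable $x = (1 - y_2)^{-2}(1 + h^2 - 2h\cos\theta)$, where
$h = \sqrt{y_1 + y_2 - y_1 y_2}$. When $c, d$ satisfy $c^2 + d^2 = a(1 - y_2)^2 + b(1 + h^2)$,
$cd = bh$, $0 < d < c$, we have, for $|\zeta| = 1$,

\[ f(z(\zeta)) = \log(a + b z(\zeta)) = \log\frac{|c + d\zeta|^2}{(1 - y_2)^2}. \]

Similarly,

\[ g(z(\zeta)) = \log(\alpha + \beta z(\zeta)) = \log\frac{|\gamma + \eta\zeta|^2}{(1 - y_2)^2}. \]

Let

\[ \tilde{f}(z(\zeta)) = \log\frac{(c + d\zeta)^2}{(1 - y_2)^2} \qquad\text{and}\qquad
   \tilde{g}(z(\zeta)) = \log\frac{(\gamma + \eta\zeta)^2}{(1 - y_2)^2}. \]

Note that $f(z(\zeta)) = \Re(\tilde{f}(z(\zeta)))$ and
$g(z(\zeta)) = \Re(\tilde{g}(z(\zeta)))$. By Theorem 2.2, we have

\[ \begin{aligned}
   m(f) &= \lim_{r \downarrow 1} \frac{1}{4\pi i}\oint_{|\zeta| = 1} f(z(\zeta))
      \left[\frac{1}{\zeta - r^{-1}} + \frac{1}{\zeta + r^{-1}}
      - \frac{2}{\zeta + \frac{y_2}{hr}}\right] d\zeta \\
   &= \lim_{r \downarrow 1} \frac{1}{8\pi}\int_0^{2\pi} \tilde{f}(z(e^{i\theta}))
      \left[\frac{e^{i\theta}}{e^{i\theta} - r^{-1}} + \frac{e^{i\theta}}{e^{i\theta} + r^{-1}}
      - \frac{2e^{i\theta}}{e^{i\theta} + \frac{y_2}{hr}}
      + \frac{r}{r - e^{i\theta}} + \frac{r}{r + e^{i\theta}}
      - \frac{2hr}{y_2 e^{i\theta} + hr}\right] d\theta \\
   &= \lim_{r \downarrow 1} \frac{1}{8\pi i}\oint_{|\zeta| = 1} \tilde{f}(z(\zeta))
      \left[\frac{1}{\zeta - r^{-1}} + \frac{1}{\zeta + r^{-1}}
      - \frac{2}{\zeta + \frac{y_2}{hr}}
      + \frac{1}{\zeta}\left(\frac{r}{r - \zeta} + \frac{r}{r + \zeta}
      - \frac{2hr}{y_2\zeta + hr}\right)\right] d\zeta \\
   &= \lim_{r \downarrow 1} \frac{1}{4}\left[\tilde{f}(z(\tfrac{1}{r}))
      + \tilde{f}(z(-\tfrac{1}{r})) - 2\tilde{f}(z(-\tfrac{y_2}{hr}))\right] \\
   &= \frac{1}{4}\left[\tilde{f}(z(1)) + \tilde{f}(z(-1)) - 2\tilde{f}(z(-\tfrac{y_2}{h}))\right] \\
   &= \frac{1}{2}\log\frac{(c^2 - d^2)h^2}{(ch - y_2 d)^2}.
   \end{aligned} \]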

Let rrij = — , where = 1, j = 1,2, r 2 i ri, and n J. 1. By Theorem 2.2, we have 



V U,9) = -7T-T 



2tt 2 7| e2 |=i 1 2 6 - r^if 



rir 2 d£i > g(z(r 2 &))dfr 



When nil, 
we get 



and are poles. We can then choose r\ so that — is a not a pole. Then 



log(a + fcz(n£i)) 

— 2— • rir 2 dt_i 



| 5l |=i (r 2 6-n6) 

(log(a + Mrigi)))' 
|f!|=i ri£i-r 2 £2 

bhr^x 



r 2 d£i 



(n&-r 2 £ 2 )(c + dn£i)c + JL 
bhr^ 1 1 



2tti 



(r-i6-r 2 6)(c + rfrxa)c (£1 + 



fbhd^c- 1 bhd" 1 ^ 



• r 2 



da 



V 6 d + cr 2 £ 2 



So, 



>V,9) = — 



lfo|=x 



1 6/id- 1 r 2 
^ 2 d + cr 2 £ 2 



log (a + /30(r 2 ^2))^2 



Since the function $g(x) = \log(\alpha + \beta x)$ is analytic, when $r_2 > 1$ is sufficiently
close to 1, we have

\[ |g(z(r_2\zeta_2)) - g(z(\zeta_2))| \le K(r_2 - 1), \]

for some constant $K$. Thus we have

/ W<r&)) - *(*(6))] ( bJ ^ - ^Pf] d6 



'lf|2 

as r 2 | 1, 

where the estimations are done according to| arg(£ 2 )| or | arg 



- tt| < \ZF 2 _ ~T or not. Thus, 



v(f,g) = --f 5 (z(6)) 



bhd- 1 ^ 1 bhdr x r 2 \ „ 

— 3+5s)* + fl O' 



/u „£ 2 1 2 \ 

where R(r 2 ) — > 0, as r 2 j 1. Because <?(z(6)) = log -7 r^- , for 7, 77 satisfying 7 2 + n 2 = 

2 + 0(1 + h 2 ), 777 = 0h, < 77 < 7, and if g(z{&)) = log ( ^t^ \ , we have 



a(l - 2/2) 

3(2(6)) = 5R (3(2(6)))- Therefore 

. xx (bhdr 1 ^ 1 bhdr 1 ^ 
s(*(6)) 



v(f,g) = — 

n -/|{| a =i 



1 



,.27T 



v 6 d + cr 2 £ 



/ , ,■»,, ("III c 



1 



,•271 



tt Jo \ e w d + cr 2 e 



1 r 27r 



bhd- x c- x bhd- 1 r 2 \ , fl ,, , 1 , 

=3 ; n z + bhd- 1 ^ 1 - —„ 

V e lB d + cr 2 e l8 J de l6 + cr 2 



bhd- x r 2 



1 



= 4^/ 5^(6)) 

1 * ^lei2=i 



+ cr 2 e 
d + cr 2 e 



dO 



~i ( i0^\ fbhd^c- 1 bhd- l r 2 \ l6 t _j bhd- l r 2 



fbhd^cr 1 bhd' x r 2 
\ 6 + cr 2 6 



bhd^c" 1 



bhd- X T 2 

e?6 + cr 2 



d6 



<*6 



= bhd- 1 ^ 1 



bhd^c' 1 



g( Z (Q))-g(z(- — )) 

cr 2 

g( Z (Q))-g(z(--)) 
c 



= 2bhdr x c- 1 log 



<:-, 



cy — drj 



Proof of (4.5) and (4.6) 

Because £ and 77 are Gaussian variables, for real case, = — 3 = 0, then (2.10), (2.11) and 

(2.13) are all 0. Consider (2.9) and (2.12), as y nk — > yk, k = 1,2,, by the computations done in 
the proof of Lemma 2.1, we see that termes tending to zero could be neglected in the considered 
contour integrals. Hence we can put y„ k = y^, k = 1, 2 and use 



$$f(x) = \log(y_1 + y_2 x) - \frac{y_2}{y_1 + y_2}\log x - \log(y_1 + y_2)$$

instead of

$$f(x) = \log(y_{n_1} + y_{n_2} x) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log x - \log(y_{n_1} + y_{n_2}).$$

Consider the variable change $x = (1-y_2)^{-2}(1 + h^2 - 2h\cos\theta)$, where $z(\xi) = (1-y_2)^{-2}[1 + h^2 + 2h\Re(\xi)]$ and $h = \sqrt{y_1 + y_2 - y_1 y_2}$.
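As

$$\log(y_1 + y_2 z(\xi)) = \log\frac{|h + y_2\xi|^2}{(1-y_2)^2}, \qquad \log(z(\xi)) = \log\frac{|1 + h\xi|^2}{(1-y_2)^2},$$

we have by Lemma 2.1,

$$\begin{aligned}
m(f) &= \frac{1}{2}\log\frac{(h^2 - y_2^2)h^2}{(h^2 - y_2^2)^2} - \frac{y_2}{y_1 + y_2}\cdot\frac{1}{2}\log\frac{(1 - h^2)h^2}{(h - y_2 h)^2} \\
&= \frac{1}{2}\left[\log\frac{y_1 + y_2 - y_1 y_2}{y_1 + y_2} - \frac{y_1}{y_1 + y_2}\log(1 - y_2) - \frac{y_2}{y_1 + y_2}\log(1 - y_1)\right]
\end{aligned}$$

and

$$\begin{aligned}
v(f) &= v(\log(y_1 + y_2 x)) + \frac{y_2^2}{(y_1 + y_2)^2}\,v(\log x) - \frac{2y_2}{y_1 + y_2}\,v(\log x, \log(y_1 + y_2 x)) \\
&= 2\log\frac{h^2}{h^2 - y_2^2} + \frac{2y_2^2}{(y_1 + y_2)^2}\log\frac{1}{1 - h^2} - \frac{4y_2}{y_1 + y_2}\log\frac{1}{1 - y_2} \\
&= -\frac{2y_2^2}{(y_1 + y_2)^2}\log(1 - y_1) - \frac{2y_1^2}{(y_1 + y_2)^2}\log(1 - y_2) - 2\log\frac{y_1 + y_2}{y_1 + y_2 - y_1 y_2}.
\end{aligned}$$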
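The simplification from the expressions in $h$ to the final expressions in $y_1, y_2$ uses only $h^2 = y_1 + y_2 - y_1 y_2$, so that $h^2 - y_2^2 = (y_1 + y_2)(1 - y_2)$ and $1 - h^2 = (1 - y_1)(1 - y_2)$. A minimal Python sketch that checks this algebra numerically is given below; the function names are ours and purely illustrative.

```python
import math


def mean_var_closed(y1, y2):
    """Final closed forms of m(f) and v(f) in terms of y1, y2."""
    s = y1 + y2
    m = 0.5 * (math.log((s - y1 * y2) / s)
               - y1 / s * math.log(1 - y2)
               - y2 / s * math.log(1 - y1))
    v = (-2 * y2 ** 2 / s ** 2 * math.log(1 - y1)
         - 2 * y1 ** 2 / s ** 2 * math.log(1 - y2)
         - 2 * math.log(s / (s - y1 * y2)))
    return m, v


def mean_var_via_h(y1, y2):
    """Intermediate forms written with h = sqrt(y1 + y2 - y1*y2)."""
    s = y1 + y2
    h2 = s - y1 * y2
    h = math.sqrt(h2)
    m = (0.5 * math.log((h2 - y2 ** 2) * h2 / (h2 - y2 ** 2) ** 2)
         - y2 / s * 0.5 * math.log((1 - h2) * h2 / (h - y2 * h) ** 2))
    v = (2 * math.log(h2 / (h2 - y2 ** 2))
         + 2 * y2 ** 2 / s ** 2 * math.log(1 / (1 - h2))
         - 4 * y2 / s * math.log(1 / (1 - y2)))
    return m, v


if __name__ == "__main__":
    for y1, y2 in [(0.05, 0.05), (0.05, 0.1)]:
        print(mean_var_closed(y1, y2), mean_var_via_h(y1, y2))
```

The two parameter pairs are the ones used in the simulation figure of the paper; both routes agree up to floating-point error.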



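Proof of $F_{y_{n_1},y_{n_2}}(f)$

By (4.3) and the density of $F_{y_{n_1},y_{n_2}}(f)$ (the limiting distribution in (2.7) but with $y_{n_k}$ in place of $y_k$, $k = 1, 2$), where $h_n = \sqrt{y_{n_1} + y_{n_2} - y_{n_1}y_{n_2}}$, $a_n = (1-y_{n_2})^{-2}\left(1 - \sqrt{y_{n_1} + y_{n_2} - y_{n_1}y_{n_2}}\right)^2$ and $b_n = (1-y_{n_2})^{-2}\left(1 + \sqrt{y_{n_1} + y_{n_2} - y_{n_1}y_{n_2}}\right)^2$, and using the substitution $x = (1-y_{n_2})^{-2}(1 + h_n^2 - 2h_n\cos\theta)$, $0 < \theta < \pi$, we have

$$\sqrt{(b_n - x)(x - a_n)} = \frac{2h_n\sin\theta}{(1-y_{n_2})^2}, \qquad x = \frac{|1 - h_n e^{i\theta}|^2}{(1-y_{n_2})^2},$$

$$dx = \frac{2h_n\sin\theta\,d\theta}{(1-y_{n_2})^2}, \qquad y_{n_1} + y_{n_2}x = \frac{|h_n - y_{n_2}e^{i\theta}|^2}{(1-y_{n_2})^2}.$$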



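Therefore,

$$\begin{aligned}
F_{y_{n_1},y_{n_2}}(f) &= \int_{a_n}^{b_n} f(x)\,\frac{(1-y_{n_2})\sqrt{(b_n - x)(x - a_n)}}{2\pi x(y_{n_1} + y_{n_2}x)}\,dx \\
&= (1-y_{n_2})\int_{a_n}^{b_n}\left[\log(y_{n_1} + y_{n_2}x) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log x\right]\frac{\sqrt{(b_n - x)(x - a_n)}}{2\pi x(y_{n_1} + y_{n_2}x)}\,dx - \log(y_{n_1} + y_{n_2}) \\
&= \frac{2(1-y_{n_2})}{\pi}\int_0^{\pi}\left[\log\frac{|h_n - y_{n_2}e^{i\theta}|^2}{(1-y_{n_2})^2} - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log\frac{|1 - h_n e^{i\theta}|^2}{(1-y_{n_2})^2}\right]\frac{h_n^2\sin^2\theta}{|1 - h_n e^{i\theta}|^2\,|h_n - y_{n_2}e^{i\theta}|^2}\,d\theta - \log(y_{n_1} + y_{n_2}) \\
&= \frac{2(1-y_{n_2})}{\pi}\int_0^{\pi}\left[\log|h_n - y_{n_2}e^{i\theta}|^2 - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log|1 - h_n e^{i\theta}|^2\right]\frac{h_n^2\sin^2\theta}{|1 - h_n e^{i\theta}|^2\,|h_n - y_{n_2}e^{i\theta}|^2}\,d\theta \\
&\qquad - 2\left(1 - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\right)\log(1 - y_{n_2}) - \log(y_{n_1} + y_{n_2}) \\
&= \Re\left\{\frac{2(1-y_{n_2})}{\pi}\int_0^{2\pi}\left[\log(h_n - y_{n_2}e^{i\theta}) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log(1 - h_n e^{i\theta})\right]\frac{h_n^2\sin^2\theta}{|1 - h_n e^{i\theta}|^2\,|h_n - y_{n_2}e^{i\theta}|^2}\,d\theta\right\} \\
&\qquad - \frac{2y_{n_1}}{y_{n_1} + y_{n_2}}\log(1 - y_{n_2}) - \log(y_{n_1} + y_{n_2}) \\
&= -\frac{1-y_{n_2}}{2\pi i}\oint_{|z|=1}\left[\log(h_n - y_{n_2}z) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log(1 - h_n z)\right]\frac{h_n^2(z - z^{-1})^2}{z\,|1 - h_n z|^2\,|h_n - y_{n_2}z|^2}\,dz \\
&\qquad - \frac{2y_{n_1}}{y_{n_1} + y_{n_2}}\log(1 - y_{n_2}) - \log(y_{n_1} + y_{n_2}) \\
&= \frac{y_{n_2} - 1}{y_{n_2}}\cdot\frac{1}{2\pi i}\oint_{|z|=1}\left[\log(h_n - y_{n_2}z) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log(1 - h_n z)\right]\frac{(z^2 - 1)^2}{z(z - h_n)\left(z - \frac{1}{h_n}\right)\left(z - \frac{y_{n_2}}{h_n}\right)\left(z - \frac{h_n}{y_{n_2}}\right)}\,dz \\
&\qquad - \frac{2y_{n_1}}{y_{n_1} + y_{n_2}}\log(1 - y_{n_2}) - \log(y_{n_1} + y_{n_2}).
\end{aligned}$$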



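There are three poles inside the unit circle: $0$, $h_n$, $y_{n_2}/h_n$. Their corresponding residues are

$$R(0) = \frac{y_{n_2} - 1}{y_{n_2}}\log(h_n),$$

$$R(h_n) = \frac{h_n^2 - 1}{h_n^2 - y_{n_2}}\left[\log(h_n) + \log(1 - y_{n_2}) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log(1 - h_n^2)\right],$$

$$R\!\left(\frac{y_{n_2}}{h_n}\right) = \frac{y_{n_2}^2 - h_n^2}{y_{n_2}(y_{n_2} - h_n^2)}\left[\log(h_n^2 - y_{n_2}^2) - \log(h_n) - \frac{y_{n_2}}{y_{n_1} + y_{n_2}}\log(1 - y_{n_2})\right].$$

Therefore,

$$\begin{aligned}
F_{y_{n_1},y_{n_2}}(f) &= R(0) + R(h_n) + R\!\left(\frac{y_{n_2}}{h_n}\right) - \frac{2y_{n_1}}{y_{n_1} + y_{n_2}}\log(1 - y_{n_2}) - \log(y_{n_1} + y_{n_2}) \\
&= \frac{y_{n_1} + y_{n_2} - y_{n_1}y_{n_2}}{y_{n_1}y_{n_2}}\log\frac{y_{n_1} + y_{n_2}}{y_{n_1} + y_{n_2} - y_{n_1}y_{n_2}} + \frac{y_{n_1}(1 - y_{n_2})}{y_{n_2}(y_{n_1} + y_{n_2})}\log(1 - y_{n_2}) \\
&\qquad + \frac{y_{n_2}(1 - y_{n_1})}{y_{n_1}(y_{n_1} + y_{n_2})}\log(1 - y_{n_1}).
\end{aligned}$$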
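The closed form for $F_{y_{n_1},y_{n_2}}(f)$ can be cross-checked by direct numerical integration. The sketch below assumes the density $(1-y_{n_2})\sqrt{(b_n - x)(x - a_n)}/\{2\pi x(y_{n_1} + y_{n_2}x)\}$ on $[a_n, b_n]$ and uses the substitution $x = (1-y_{n_2})^{-2}(1 + h_n^2 - 2h_n\cos\theta)$ with a midpoint rule in $\theta$. The function names, the grid size and the test values are our own illustrative choices.

```python
import math


def integrate_against_density(g, y1, y2, n=2000):
    """Midpoint rule in theta for the integral of g(x) against the assumed density."""
    h = math.sqrt(y1 + y2 - y1 * y2)
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * math.pi / n
        cos_t = math.cos(t)
        x = (1 + h * h - 2 * h * cos_t) / (1 - y2) ** 2
        # density(x) dx written in theta:
        # 2(1-y2)/pi * h^2 sin^2(t) / (|1 - h e^{it}|^2 |h - y2 e^{it}|^2) dt
        weight = (2 * (1 - y2) / math.pi * (h * math.sin(t)) ** 2
                  / ((1 + h * h - 2 * h * cos_t)
                     * (h * h + y2 * y2 - 2 * h * y2 * cos_t)))
        total += g(x) * weight
    return total * math.pi / n


def f(x, y1, y2):
    """f(x) = log(y1 + y2 x) - y2/(y1+y2) log x - log(y1 + y2)."""
    return (math.log(y1 + y2 * x)
            - y2 / (y1 + y2) * math.log(x)
            - math.log(y1 + y2))


def centering_closed(y1, y2):
    """Closed form of the centering term obtained from the three residues."""
    s = y1 + y2
    q = s - y1 * y2
    return (q / (y1 * y2) * math.log(s / q)
            + y1 * (1 - y2) / (y2 * s) * math.log(1 - y2)
            + y2 * (1 - y1) / (y1 * s) * math.log(1 - y1))


if __name__ == "__main__":
    y1, y2 = 0.05, 0.1
    print(integrate_against_density(lambda x: 1.0, y1, y2))      # total mass
    print(integrate_against_density(lambda x: f(x, y1, y2), y1, y2))
    print(centering_closed(y1, y2))
```

The first printed value checks that the assumed density integrates to one; the last two compare the numerical integral of $f$ with the closed form.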



Proof of (4.8) and (4.9) 



Because $x$ and $y$ are random variables from the normalized $t$-distribution with 5 degrees of freedom, $x$ and $y$ are i.i.d., $Ex = Ey = 0$, $E|x|^2 = E|y|^2 = 1$ and $E|x|^4 = E|y|^4 = 9$. In the real case, $\beta = E|\xi|^4 - 3 = 6$, and the terms (2.9) and (2.12) are the same as for Gaussian variables. Consider the terms (2.10), (2.11) and (2.13). By the same argument as in the proof of (4.5) and (4.6), we use $f(x) = \log(y_1 + y_2 x) - \frac{y_2}{y_1 + y_2}\log x - \log(y_1 + y_2)$ instead.
For (2.10), we have



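$$\begin{aligned}
&\frac{\beta\cdot y_1(1-y_2)^2}{2\pi i\cdot h^2}\oint_{|\xi|=1}\left[\log\frac{|h + y_2\xi|^2}{(1-y_2)^2} - \frac{y_2}{y_1 + y_2}\log\frac{|1 + h\xi|^2}{(1-y_2)^2} - \log(y_1 + y_2)\right]\frac{1}{\left(\xi + \frac{y_2}{h}\right)^3}\,d\xi \\
&= \frac{\beta\cdot y_1(1-y_2)^2}{2\pi i\cdot h^2}\oint_{|\xi|=1}\left[\log(h + y_2\xi) + \log(h + y_2\xi^{-1}) - \frac{y_2}{y_1 + y_2}\left\{\log(1 + h\xi) + \log(1 + h\xi^{-1})\right\}\right]\frac{1}{\left(\xi + \frac{y_2}{h}\right)^3}\,d\xi \\
&= \frac{\beta\cdot y_1(1-y_2)^2}{2h^2}\left[\frac{y_2}{y_1 + y_2}\cdot\frac{h^2}{(1-y_2)^2} - \frac{y_2^2h^2}{(h^2 - y_2^2)^2}\right] \\
&= \frac{\beta y_1^2 y_2}{2(y_1 + y_2)^2}.
\end{aligned}$$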
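The value $\beta y_1^2 y_2 / \{2(y_1 + y_2)^2\}$ can be checked numerically. The sketch below assumes that the term (2.10) is the contour integral $\frac{\beta y_1(1-y_2)^2}{2\pi i h^2}\oint_{|\xi|=1} f(z(\xi))\,(\xi + y_2/h)^{-3}\,d\xi$ with $z(\xi) = (1-y_2)^{-2}[1 + h^2 + 2h\Re(\xi)]$, and evaluates it with the trapezoidal rule on the unit circle. The function names and the grid size are illustrative assumptions.

```python
import cmath
import math


def term_2_10_numeric(y1, y2, beta, n=4096):
    """Trapezoidal rule on |xi| = 1 for the assumed form of (2.10)."""
    h = math.sqrt(y1 + y2 - y1 * y2)
    w = y2 / (y1 + y2)
    total = 0j
    for k in range(n):
        xi = cmath.exp(2j * math.pi * k / n)
        z = (1 + h * h + 2 * h * xi.real) / (1 - y2) ** 2
        f_val = math.log(y1 + y2 * z) - w * math.log(z) - math.log(y1 + y2)
        # d(xi) = i * xi * d(theta) on the unit circle
        total += f_val / (xi + y2 / h) ** 3 * (1j * xi)
    integral = total * (2 * math.pi / n)
    return beta * y1 * (1 - y2) ** 2 / (2j * math.pi * h * h) * integral


def term_2_10_closed(y1, y2, beta):
    """Closed form beta * y1^2 * y2 / (2 (y1 + y2)^2)."""
    return beta * y1 ** 2 * y2 / (2 * (y1 + y2) ** 2)


if __name__ == "__main__":
    y1, y2, beta = 0.05, 0.1, 6.0
    print(term_2_10_numeric(y1, y2, beta), term_2_10_closed(y1, y2, beta))
```

Because the integrand is analytic in an annulus containing the unit circle, the equally spaced rule converges very quickly, and the imaginary part of the numerical value should vanish up to rounding.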
For (2.11), we have 

/?• (1-2/2) 

47TZ 



2/2 



2/1 + 2/2 



log(l + + log(l + 



(£+If) 3 ^ 



l€l=i 



\og{h + y 2 i) - 



1)2 
So, (2.13) becomes,

Figure 2. Sizes of the traditional LRT and the corrected LRT based on 10,000 independent replications using real Gaussian variables. Left: $y_1 = y_2 = 0.05$. Right: $y_1 = 0.05$, $y_2 = 0.1$.



